Beyond Basic Search: Addressing the Limitations of Semantic Similarity

Beyond Similarity

The 80% problem occurs when basic semantic search works for simple queries but fails on edge cases. When we search by similarity alone, the vector store typically returns the chunks whose embeddings are numerically closest to the query. If those chunks are nearly identical, however, the language model receives redundant information, which wastes the limited context window and misses broader perspectives.

Advanced Retrieval Pillars

Maximal Marginal Relevance (MMR): Instead of simply selecting the most similar items, MMR balances relevance and diversity to avoid redundancy. $\text{MMR} = \arg\max_{d \in R \setminus S} \left[ \lambda \cdot \text{sim}(d, q) - (1 - \lambda) \cdot \max_{s \in S} \text{sim}(d, s) \right]$, where $q$ is the query, $R$ is the set of retrieved candidates, $S$ is the set of documents already selected, and $\lambda \in [0, 1]$ sets the trade-off between relevance and diversity.
Self-querying: Uses the language model to turn natural language into structured metadata filters (for example, filtering by "Topic 3" or "Source: PDF").
Contextual compression: Trims the retrieved documents down to only the high-value passages related to the query, saving tokens.
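The self-querying idea from the list above can be sketched in a few lines. This is a toy illustration, not how any particular library implements it: the hypothetical `parse_query` function uses a regular expression as a stand-in for the LLM call that would normally produce the filter, and the documents and their `source` values are made-up sample data.

```python
import re

# Assumption: sources are named "lecture<N>.pdf"; the ordinals map to N.
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}


def parse_query(question):
    """Stand-in for the LLM step: return (search text, metadata filter)."""
    metadata_filter = {}
    match = re.search(r"\b(first|second|third|fourth|fifth) lecture\b", question.lower())
    if match:
        metadata_filter["source"] = f"lecture{ORDINALS[match.group(1)]}.pdf"
    return question, metadata_filter


def filtered_search(question, documents):
    """Keep only documents whose metadata matches every key in the filter.

    A real retriever would then rank the survivors by similarity to the
    search text; that step is omitted to keep the sketch short.
    """
    _, metadata_filter = parse_query(question)
    return [
        doc for doc in documents
        if all(doc["metadata"].get(key) == value for key, value in metadata_filter.items())
    ]


# Made-up sample data for illustration only.
documents = [
    {"text": "Probability is introduced with coin flips.", "metadata": {"source": "lecture1.pdf"}},
    {"text": "Conditional probability and Bayes' rule.", "metadata": {"source": "lecture3.pdf"}},
    {"text": "Linear regression recap.", "metadata": {"source": "lecture3.pdf"}},
]

question = "What did the instructor say about probability in the third lecture?"
_, inferred_filter = parse_query(question)
results = filtered_search(question, documents)
print(inferred_filter)
print([doc["text"] for doc in results])
```

The point of the design is that the filter is applied as an exact metadata match rather than left to embedding similarity, so a chunk from the wrong lecture cannot slip in just because it also mentions probability.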

The Redundancy Trap

Giving the language model three versions of the same paragraph does not make it smarter; it only makes the prompt more expensive. Diversity is key to high-value context.
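To make the redundancy trap concrete, here is a minimal sketch of the MMR selection rule from the formula above, written in plain Python. The 2-D vectors are made-up embeddings, and `cosine` and `mmr_select` are illustrative helper names rather than a library API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, lam=0.5):
    """Greedily pick k document indices using the MMR score.

    score(d) = lam * sim(d, q) - (1 - lam) * max over selected s of sim(d, s)
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(doc_vecs[i], query_vec)
            # With nothing selected yet, the redundancy penalty is zero.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Made-up embeddings: documents 0 and 1 are near-duplicates,
# document 2 is less similar to the query but points a different way.
query = [1.0, 0.0]
docs = [
    [0.98, 0.20],
    [0.97, 0.24],
    [0.60, -0.80],
]

top_by_similarity = sorted(
    range(len(docs)), key=lambda i: cosine(docs[i], query), reverse=True
)[:2]
top_by_mmr = mmr_select(query, docs, k=2, lam=0.5)
print(top_by_similarity, top_by_mmr)
```

With these vectors, plain similarity ranking keeps both near-duplicates (indices 0 and 1), while MMR with `lam=0.5` keeps index 0 and swaps in index 2. Setting `lam=1.0` removes the diversity penalty and reduces MMR to ordinary similarity ranking.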

Knowledge Check

You want your system to answer "What did the instructor say about probability in the third lecture?" specifically. Which tool allows the LLM to automatically apply a filter for { "source": "lecture3.pdf" }?

ConversationBufferMemory

Self-Querying Retriever

Contextual Compression

MapReduce Chain

Challenge: The Token Limit Dilemma

Apply advanced retrieval strategies to solve a real-world constraint.

You are building a RAG system for a legal firm. The documents retrieved are 50 pages long, but only 2 sentences per page are actually relevant to the user's specific query. The standard "Stuff" chain is throwing an OutOfTokens error because the context window is overflowing with irrelevant text.

Step 1

Identify the core problem and select the appropriate advanced retrieval tool to solve it without losing specific nuances.

Problem: The context window limit is being exceeded by "low-nutrient" text surrounding the relevant facts.

Tool Selection: ContextualCompressionRetriever

Step 2

What specific component must you use in conjunction with this retriever to "squeeze" the documents?

Solution: Use an LLMChainExtractor as the base for your compressor. This will process the retrieved documents and extract only the snippets relevant to the query, passing a much smaller, highly concentrated context to the final prompt.
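The compression step can be pictured with a small stand-alone sketch. It does not call LangChain or an LLM: the hypothetical `extract_relevant` function uses simple keyword overlap as a stand-in for the extraction that an LLMChainExtractor would perform, and the contract pages are invented sample text.

```python
import re

# Assumption: a tiny stopword list is enough for this illustration.
STOPWORDS = {"a", "an", "the", "of", "is", "what", "for", "in", "to", "and"}


def words(text):
    """Lowercase alphabetic tokens of a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def extract_relevant(page, query, min_overlap=1):
    """Keep only the sentences that share at least min_overlap query terms."""
    query_terms = words(query) - STOPWORDS
    sentences = re.split(r"(?<=[.!?])\s+", page.strip())
    kept = [s for s in sentences if len(query_terms & words(s)) >= min_overlap]
    return " ".join(kept)


def compress_documents(pages, query):
    """Compress every page and drop pages with nothing relevant left."""
    compressed = []
    for page in pages:
        snippet = extract_relevant(page, query)
        if snippet:
            compressed.append(snippet)
    return compressed


# Invented sample pages for illustration only.
pages = [
    "This agreement is governed by the laws of the state. "
    "Either party may end the contract with a termination notice of thirty days. "
    "Headings are for convenience only.",
    "Payment is due within ten days of invoice. Late fees accrue monthly.",
]

query = "What is the termination notice period?"
context = compress_documents(pages, query)
print(context)
```

Only the sentence about the termination notice survives, and the second page is dropped entirely, so the final prompt carries one sentence instead of two full pages. An LLM-based extractor makes the same keep-or-drop decision with far better judgment than keyword overlap, at the cost of one extra model call per retrieved document.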